GEFormerDTA: drug target affinity prediction based on transformer graph for early fusion

Predicting the interaction affinity between drugs and target proteins is crucial for rapid and accurate drug discovery and repositioning. More accurate prediction of drug-target affinity (DTA) has therefore become a key area of research in drug discovery and drug repositioning. However, traditional experimental methods suffer from long operation cycles, high manpower requirements, and high economic costs, making it difficult to determine specific interactions between drugs and target proteins quickly and accurately. Some computational methods mainly use the SMILES sequence of drugs and the primary structure of proteins as inputs, ignoring graph information such as the bond encoding, degree centrality encoding, and spatial encoding of drug molecule graphs, as well as protein structural information such as secondary structure and accessible surface area. Moreover, previous methods learned feature representations from protein sequences alone, neglecting the completeness of information. To address the completeness of drug and protein structure information, we propose a Transformer graph-based early fusion approach for drug-target affinity prediction (GEFormerDTA). Our method reduces prediction errors caused by insufficient feature learning. Experimental results on the Davis and KIBA datasets showed better prediction of drug-target affinity than existing affinity prediction methods.


Problem definition
The drug-target binding affinity (DTA) problem aims to predict the binding affinity between a drug and a target protein. This is a mathematical regression problem:
$$y = F_{\theta}(d, p), \quad d \in D,\ p \in P$$
where $D = \{d_1, d_2, d_3, \ldots, d_i\}$, $P = \{p_1, p_2, p_3, \ldots, p_i\}$, and $\theta$ is a learnable parameter in the prediction model $F$. Given a new drug $d$ or a new target protein $p$, our task is to predict the affinity score between the new drug and the targets in $P$, or between the new target and the drugs in $D$.

Dataset
We evaluated our proposed model on two different datasets, the kinase dataset Davis 31 and the KIBA dataset 32, both of which have been used as gold standard datasets for prediction assessment in DTI and DTA studies 14,33.
The Davis dataset contains selective assays of kinase protein families, related inhibitors, and their respective dissociation constant ($K_d$) values. It contains the interactions of 442 proteins and 68 ligands. On the other hand, the KIBA dataset was derived from a method called KIBA, which combines the biological activities of kinase inhibitors from different sources (e.g., $K_i$, $K_d$, and IC50) 32. The prediction of these kinase inhibitors is explored further in 34. KIBA scores were constructed to optimize the concordance between $K_i$, $K_d$, and IC50 by exploiting the statistical information they contain. The KIBA dataset initially contained 467 targets and 52498 drugs 14. Removing some of these drugs and targets can mitigate the impact of noise on model training, balancing the dataset and preventing an undue focus on specific drugs and targets during the model training process. Table 1 summarizes the datasets we used in our experiments. To show the properties of the drugs and proteins in Table 1 more visually, we depict the breadth and length of the two gold standard datasets in Fig. 1.
Regarding data density, the model performs well in handling sparse graphs, considering only the immediate neighbors of nodes. Therefore, the model performs better when dealing with the low-density KIBA dataset. However, its performance is poorer in the high-density Davis dataset. Concerning data size, the model utilizes self-attention mechanisms to handle small-scale data, capturing global information about the molecular graph neighborhood and aiding in extracting key node information. However, when dealing with large-scale data, the model has longer training cycles.
While 33 directly uses the $K_d$ values from the Davis dataset as binding affinity values, we use values transformed into logarithmic space, denoted as $pK_d$, as in equation (2):
$$pK_d = -\log_{10}\left(\frac{K_d}{10^{9}}\right) \quad (2)$$
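The logarithmic transform can be illustrated with a short sketch. It assumes the common convention that $K_d$ is reported in nanomolar units, so that dividing by $10^9$ converts it to molar before taking the negative base-10 logarithm; the function name `kd_to_pkd` and the sample values are illustrative and do not come from the paper.

```python
import math


def kd_to_pkd(kd_nm: float) -> float:
    """Convert a dissociation constant (assumed to be in nM) to pKd.

    Assumption: pKd = -log10(Kd / 1e9), i.e. Kd is first converted to molar.
    """
    if kd_nm <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_nm / 1e9)


# Illustrative values only: a tight binder, a moderate binder, a weak binder.
sample_kd_nm = [1.0, 100.0, 10000.0]
sample_pkd = [kd_to_pkd(kd) for kd in sample_kd_nm]
```

Under this convention a smaller $K_d$ (tighter binding) maps to a larger $pK_d$, which compresses the wide dynamic range of raw $K_d$ values into a scale that is easier to regress on.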

Drug representation
In the dataset, the affinity pairs primarily consist of drugs and proteins. The input for drug compounds mainly utilizes two data formats: SMILES and SDF. In our proposed method, the molecular graph of a drug is constructed based on the SMILES string and SDF file data. Specifically, the SDF format molecular data is parsed using the RDKit tool 35 to obtain the two-dimensional structural information of the molecule. In the molecular graph representation, atoms represent the nodes of the graph. The combination of node features encompasses a variety of properties, including atom symbol, atom degree, atom implicit valence, the number of free valence electrons, atom hybridization type, and atom aromaticity. These attribute features are concatenated to form a multidimensional feature. The edges in the graph represent the chemical bonds of the molecule, and the presence or absence of an edge between two nodes indicates whether there is an interaction between the atoms. We construct an adjacency matrix based on these edges, which encapsulates the positional information of the node with respect to other nodes. In our study, we use $G_d = (V_d, E_d)$ to represent the graph representation of the drug compound, where $V_d$ represents the atoms of the drug compound, and $E_d$ represents the chemical bonds of the drug compound.
We define the set of attributes of atom $j$ of the $i$-th drug $d_i$ in the entire drug set $D$ of the database as $x^{d_i}_j$, which is a vector of nine attributes, denoted as follows:
$$x^{d_i}_j = [a_1, a_2, a_3, \ldots, a_9]$$
where $x^{d_i}_j$ represents the mathematical expression of atom $j$ of drug $d_i$, $a_1$ represents the number of atoms in drug $d_i$, $a_2$ represents chiral information including R-type, S-type, axial chirality, planar chirality, and helical chirality, and $[a_3, a_4, \ldots, a_9]$ represents, in order, the atomic degree (number of chemical bonds), formal charge, number of connected hydrogen atoms, number of radical electrons, type of atomic hybridization, whether or not an aromatic bond is formed, and whether or not the atom is in a ring. These properties of $x^{d_i}_j$ can be obtained with the RDKit tool and embedded as integers under the guidance of a predefined dictionary.

Degree centrality encoding
We first extracted the atomic and chemical bonding information of the drug using the RDKit tool 35,36. The more edges an atom has, the more critical the atom is to the model, or the more complex its interconnections with other atoms are. In this paper, we characterize the degree features in the molecular graph by atomic degree centrality as an additional signal for the neural network. Since the degree centrality encoding (see Fig. 2) is applied to each node, we only need to add it to the atomic node features to form the degree centrality features of the atoms. This encoding allows the model to capture the semantic relevance and importance of the atoms more reliably and pass them into the attention mechanism, as shown in the following equations:
$$h^{(0)}_j = x^{d_i}_j + e^{-}_{\deg^{-}(v_j)} + e^{+}_{\deg^{+}(v_j)}$$
$$A_{ij} = \frac{(h_i W_Q)(h_j W_K)^{T}}{\sqrt{d}}$$
where $e^{-}, e^{+} \in \mathbb{R}^{d}$ denote the learnable embedding vectors specified by the incoming and outgoing degrees of the atomic nodes, respectively. Additionally, $h$ denotes the atomic features of atom $j$ in drug $d_i$. Here, $d$ denotes the modulation factor, and $W_Q$ and $W_K$ are the weight matrices for atoms (nodes) $i$ and $j$, respectively.
For undirected graphs, the incoming degree $\deg^{-}(v_j)$ and outgoing degree $\deg^{+}(v_j)$ can be uniformly denoted as $\deg(v_j)$. By adding the degree centrality encoding to the node features, softmax attention can capture the critical information of the nodes in $K$ and $Q$. Therefore, the model can capture the semantic relevance and the critical information of the nodes in the attention mechanism.
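A minimal sketch of degree centrality encoding for an undirected molecular graph follows. It assumes that the encoding is a lookup table indexed by node degree whose row is added element-wise to the node feature vector; the table here is filled with random numbers as a stand-in for learnable embeddings, and the helper names, the three-atom chain, and the feature width are all hypothetical.

```python
import random


def node_degrees(num_nodes, edges):
    """Count the degree of every node in an undirected graph."""
    deg = [0] * num_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def add_degree_encoding(node_feats, degrees, degree_table):
    """Add the embedding row selected by each node's degree to its features."""
    return [
        [x + e for x, e in zip(feat, degree_table[d])]
        for feat, d in zip(node_feats, degrees)
    ]


# Toy chain of three heavy atoms (0-1-2); indices are illustrative.
edges = [(0, 1), (1, 2)]
degrees = node_degrees(3, edges)

dim = 4
rng = random.Random(0)
# Stand-in for a learnable embedding table: one row per possible degree 0..4.
degree_table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(5)]

node_feats = [[0.0] * dim for _ in range(3)]
h0 = add_degree_encoding(node_feats, degrees, degree_table)
```

Because the two terminal atoms share the same degree, they receive the same additive signal, while the central atom receives a different one; this is the extra structural cue passed to the attention layer.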

Atomic spatial position encoding
The Transformer has a global receptive field, but it relies heavily on positional information for encoding. For the sequential data found in natural language problems, it is possible to encode each position (i.e., absolute position encoding) 37,38 or to encode any two positions in the Transformer layer (i.e., relative position encoding) 39.
However, when graph information built from the spatial structure is used as the input to the Transformer model, such sequential position encodings are instead detrimental to the model's predictions. We therefore introduce a spatial location encoding to capture the spatial structure information of the drug graph. First, we write the set of drug nodes as $V_d = \{v_1, v_2, \ldots, v_N\}$. We describe the function $\varphi(v_i, v_j)$ as a measure of connectivity between nodes in the graph. In the drug graph, we set the pathway $\varphi(v_i, v_j) \in \mathbb{R}^{N}$ between $v_i$ and $v_j$ to
$$\varphi(v_i, v_j) = \mathrm{SPD}(v_i, v_j)$$
where $\mathrm{SPD}(v_i, v_j)$ denotes the shortest path distance (SPD) reachable between $v_i$ and $v_j$.
After encoding by degree centrality and spatial location, we obtain the embedding matrix of the atomic pair (node pair) $(v_i, v_j)$, in which $W^{(i,j)}_{\varphi}$ is the weight of the spatial location feature of the drug node pair and $\mathrm{Feature}^{p}_{ij}$ is the embedding of the spatial structure feature.
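The shortest-path quantity used by the spatial encoding can be computed with a breadth-first search from every node. The sketch below is one straightforward way to do this for an unweighted molecular graph; marking unreachable pairs with -1 is an assumption made here for illustration, as the text does not say how disconnected pairs are handled.

```python
from collections import deque


def shortest_path_distances(num_nodes, edges):
    """All-pairs shortest path distances (in bonds) via repeated BFS.

    Unreachable pairs are marked with -1 (an assumption for this sketch).
    """
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    spd = [[-1] * num_nodes for _ in range(num_nodes)]
    for source in range(num_nodes):
        spd[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if spd[source][w] == -1:
                    spd[source][w] = spd[source][u] + 1
                    queue.append(w)
    return spd


# Toy graph: a chain 0-1-2 plus an isolated node 3.
spd = shortest_path_distances(4, [(0, 1), (1, 2)])
```

Each entry `spd[i][j]` can then index a table of learnable spatial embeddings, so that atom pairs at the same topological distance share the same bias in the attention layer.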

Interatomic chemical bonding coding
Edges are also an important component in handling graph tasks. For example, in molecular graphs of drug compounds, features describing the types of chemical bonds can be assigned to atom pairs. These features are as crucial as node features in representing the graph and are indispensable for encoding in graph tasks. Previous approaches to graph tasks mainly include two methods: (1) Edge features are added to the associated node features 40. (2) For each node, the features of its associated edges are used together with the aggregated node features 41. However, these approaches only propagate edge information to their associated (neighbor) nodes, which may not effectively utilize edge information to represent the entire graph.
We introduce chemical bond encoding to better encode edge features into the attention layer. For adjacent atom pairs, the edge feature is defined as
$$x_e = [b_1, b_2, b_3]$$
where $b_1$ denotes the bond type, $b_2$ denotes the bond stereochemistry, and $b_3$ denotes whether the bond is conjugated. $b_1$, $b_2$ and $b_3$ can be obtained with the RDKit tool. If the shortest path between $i$ and $j$ is $P = (e_1, e_2, \ldots, e_k)$, then the edge encoding of the pair $(i, j)$ is computed from the features of the edges along $P$.
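The source does not show how the edge features along the shortest path are combined. One common choice in graph Transformers, used here purely as an assumption, is to average the dot products between each edge's feature vector and a weight vector tied to the edge's position on the path. The helper names, bond feature values, and weights below are hypothetical.

```python
from collections import deque


def shortest_path_edges(num_nodes, edges, src, dst):
    """Return the edges (u, v) along one shortest path from src to dst."""
    adj = {n: [] for n in range(num_nodes)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    if dst not in parent:
        return []  # no path between the two atoms

    path = []
    node = dst
    while parent[node] is not None:
        path.append((parent[node], node))
        node = parent[node]
    return path[::-1]


def edge_encoding(path, bond_feats, position_weights):
    """Average of <x_e_n, w_n> over the edges on the path (assumed form)."""
    if not path:
        return 0.0
    total = 0.0
    for n, (u, v) in enumerate(path):
        x_e = bond_feats[frozenset((u, v))]  # [b1, b2, b3] of this bond
        w_n = position_weights[n]            # weight for the n-th path position
        total += sum(a * b for a, b in zip(x_e, w_n))
    return total / len(path)


# Toy chain 0-1-2 with made-up integer-coded bond features [b1, b2, b3].
edges = [(0, 1), (1, 2)]
bond_feats = {
    frozenset((0, 1)): [1.0, 0.0, 0.0],
    frozenset((1, 2)): [2.0, 0.0, 1.0],
}
position_weights = [[1.0, 1.0, 1.0], [0.5, 0.5, 0.5]]

path = shortest_path_edges(3, edges, 0, 2)
c_02 = edge_encoding(path, bond_feats, position_weights)
```

The resulting scalar can be added to the attention score of the pair, so that bond information influences attention between atoms that are not directly bonded.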

Protein representation
Previous studies 25,42 typically used protein sequences as input for deep learning models, where protein residues were encoded into a vector space using techniques like one-hot encoding or BPE encoding. These studies employed a lightweight 1D convolutional layer encoder to extract valuable features from the protein. However, these methods solely captured the primary structure information of proteins. Predicting the 3D structure from a 1D sequence is a formidable task, making 1D representations inadequate for capturing the spatial structural features of proteins. Obtaining 3D structures for certain proteins is challenging due to their limited representation in databases 43. Moreover, representing the irregular 3D structure requires a large-scale 3D matrix, resulting in computationally expensive model execution. Additionally, experimentally determined 3D structures may suffer from low quality since they depend on the intricate and demanding process of co-crystallization of protein-ligand pairs. Hence, it is necessary to shift our focus towards the secondary structure and other protein information.
To tackle the complexity and accessibility challenges, we employ secondary structure (SS) and accessible surface area (ASA) 44 for representing the protein graph structure. SS determines the backbone structure of the target protein, while ASA indicates the degree of contact or exposure of amino acid residues to the solvent in its three-dimensional structure. The interaction between nonadjacent residues is captured by a distance map (DM), which serves as a protein feature. The pairwise distance matrix of residues efficiently captures contact information in the protein structure and can be calculated using SPOT-Contact 45. DM has proven successful in predicting various protein properties, such as solubility 46, DTI 47 and DTA. Contact between two non-adjacent residues occurs when their distance is less than 8 Å. However, simply vectorizing each residue in the protein sequence using one-hot encoding lacks information about residue similarity and treats all residues as equally distant from one another. This representation also limits the model's learning capability by disregarding the dependency information between residues. In many protein datasets, only a limited number of target proteins provide available information, while most of the protein information remains untapped, leading to detrimental DTA prediction results.
The TAPE 48 approach utilizes amino acid embeddings in a continuous vector space and employs the self-attention mechanism of the Transformer to capture contextual relationships and information in protein sequences. Instead of one-hot encoding, TAPE uses embedded representations learned from unlabeled protein sequences to represent protein graph nodes. The fusion of embedding vectors from TAPE with secondary structure and solvent accessibility feature vectors represents the node features in the protein graph (see Fig. 3). Each amino acid residue is assigned to one of eight categories, providing detailed secondary structure information. Given a protein sequence of $M$ residues, the node feature set is $V_p = \{v_i \in \mathbb{R}^{h}\}_{i=1}^{M}$, where $h$ is the length of the embedding vector $v_i$ provided by TAPE, which captures context-dependent residue information. Protein secondary structure, formed by coiled folding of peptide chains, contains vital information about protein activity, function, and stability, benefiting model predictions. The distance map, as global structure information, may be important in future DTA identification. The work in 47 introduced super nodes connecting other nodes in the composite structure graph.
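The 8 Å contact rule can be turned into protein graph edges with a simple threshold on the residue distance map. The sketch below assumes that "non-adjacent" means a sequence separation of at least two residues and that the distance map is already available as a square matrix; the function name, the `min_separation` parameter, and the toy distances are illustrative, not values from the paper.

```python
def contact_map(dist, threshold=8.0, min_separation=2):
    """Binary contact map from a residue-residue distance matrix (in angstroms).

    A pair (i, j) is a contact when the residues are at least
    `min_separation` apart in sequence and closer than `threshold`.
    """
    n = len(dist)
    return [
        [
            1 if abs(i - j) >= min_separation and dist[i][j] < threshold else 0
            for j in range(n)
        ]
        for i in range(n)
    ]


# Made-up distances for a four-residue fragment.
toy_dist = [
    [0.0, 3.8, 6.0, 9.5],
    [3.8, 0.0, 3.8, 7.2],
    [6.0, 3.8, 0.0, 3.8],
    [9.5, 7.2, 3.8, 0.0],
]
contacts = contact_map(toy_dist)
```

The nonzero entries of `contacts` can serve as edges between residue nodes, complementing the backbone edges between sequence neighbors.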

Proposed model
The general architecture of our proposed method is shown in Fig. 4. Our GEFormerDTA takes the drug molecule graph structure $G_d$ and the target protein graph structure $G_p$ as inputs and outputs the final prediction results. To process the graph structure information, we use a graph convolutional network (GCN). Our GEFormerDTA model consists of six main steps: information preprocessing (Fig. 4a), drug ESC encoding (Fig. 4b), drug graph encoding (Fig. 4c), drug-target protein graph early fusion (Fig. 4d), drug-target protein graph refinement (Fig. 4e) and affinity scoring (Fig. 4f). In the steps of Fig. 4b,c,e, we also added residual connections to slow down the degradation of the generalization performance of our network.

GEFormerDTA overview
Before we input the drug into the GEFormerDTA model, we encode the drug with two types of encoders: (1) the ESC encoder; (2) the Mol. encoder. The ESC encoder mainly uses the global receptive field of the Transformer to capture the global information of the drug molecule, while the Mol. encoder captures the main node information in the drug graph. Meanwhile, we fuse the obtained protein feature maps with the drug feature maps extracted by the Mol. encoder. The fused drug-protein map is fed to the drug-target protein distillation process to obtain the distilled drug-protein map, and finally the results are obtained by DTA prediction.

ESC encoder
As shown in Fig. 2, after obtaining the node features, spatial position features, and edge features of the molecular graph, a traditional attention model would face the challenge of high dimensionality and many molecular nodes, which seriously affects the efficiency of model training. To address this, together with the issue of memory overhead, we introduce the Sparsepro self-attention molecular graph encoder to extract the important queries in $Q$ and reduce model complexity. Meanwhile, we use self-attention distillation to reduce feature dimensionality and the number of network parameters. As shown in Fig. 4b, our drug molecule encoder is a sandwich model that includes 3 layers of Sparsepro self-attention and 2 layers of GCN. Our Sparsepro self-attention can attach great importance to atoms or edge matrices that make significant contributions, while ignoring others. Sparsepro self-attention can be expressed by the following mathematical formula:
$$\mathcal{A}(Q, K, V) = \mathrm{Softmax}\left(\frac{\bar{Q} K^{T}}{\sqrt{d}}\right) V$$
where $\bar{Q}$ is a sparse matrix of the same size as $Q$, which contains only the top-$S$ queries. We compute the scores of all queries in $Q$ and sort them based on a KL-divergence sparsity measure 49. This paper adopts $S = 25$ to form $\bar{Q}$, which replaces $Q$.
The time complexity of point-wise computation in Sparsepro self-attention is $O(\ln L_Q)$, and the memory usage for each Q-K lookup and each block is $O(L_K \ln L_Q)$ 49. Formula (10) is then refined, and once all the features of the drug molecule graph have been input into the model, the refined expression is used to compute the Sparsepro self-attention. In addition, we set a distillation operation immediately after each Sparsepro self-attention block to prioritize mappings with focal features and pass the focal feature map as input to the next layer. The specific operation is as follows:
$$X_{j+1} = \mathrm{MaxPool}\left(\mathrm{ELU}\left(\mathrm{Conv1d}\left([X_j]_{\mathrm{ops}}\right)\right)\right)$$
where $[\cdot]_{\mathrm{ops}}$ denotes the output of the Sparsepro self-attention block after its associated operations, $X_j$ denotes the input of the $j$-th self-attention block, Conv1d denotes the 1D convolutional layer, ELU is the activation function, and MaxPool is the maximum pooling layer. Before inputting the drug into the GEFormerDTA model, we transform the SMILES sequence of the drug into a 2D structure by scripting. We then extract the atomic structure information from the 2D structure of the drug and convert it, through three encoding designs 50, into encodings that can be applied to the attention mechanism.
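The idea of keeping only the top-S queries can be sketched in plain Python. This is an illustrative approximation, not the authors' implementation: it assumes the sparsity score of a query is the gap between its largest and its mean scaled dot product with the keys, and that queries outside the top-S simply output the mean of V. Both choices are common in sparse-query attention but are not stated in the text. The sketch also scores every query exhaustively, so it shows the selection logic rather than the claimed complexity savings.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def sparse_attention(Q, K, V, top_s):
    """Attention in which only the top_s most 'active' queries attend.

    Assumptions of this sketch: activity = max score - mean score per query;
    inactive queries output the column-wise mean of V.
    """
    d = len(Q[0])
    scores = [[dot(q, k) / math.sqrt(d) for k in K] for q in Q]
    measure = [max(row) - sum(row) / len(row) for row in scores]
    ranked = sorted(range(len(Q)), key=lambda i: measure[i], reverse=True)
    active = set(ranked[:top_s])

    mean_v = [sum(col) / len(V) for col in zip(*V)]
    out = []
    for i, row in enumerate(scores):
        if i in active:
            w = softmax(row)
            out.append([sum(wj * v[c] for wj, v in zip(w, V))
                        for c in range(len(V[0]))])
        else:
            out.append(list(mean_v))
    return out, sorted(active)


# Toy inputs: three queries, two keys/values of width 2.
Q = [[1.0, 0.0], [0.0, 0.0], [0.0, 3.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
out, active_queries = sparse_attention(Q, K, V, top_s=2)
```

In this toy case the all-zero query has a flat score distribution, so it is the one dropped from the active set and receives the averaged value vector.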

Mol. encoder
To improve the accuracy of model prediction, we also use the graph information of drug molecules as input to the model. This approach differs from the treatment of drug data mentioned in 2.5.2: here, the atomic features of the drugs (element types, atomic degrees, atomic indices, atomic implicit valence, formal charge, hybridization types) are fed directly into the Mol. encoder.
Due to the strong affinity of GCN networks for graph information, we use the GCN neural network layer as the first feature extraction network layer for drug graph information, with the mathematical expression given by
$$H^{(i)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(i-1)} W^{(i-1)}\right)$$
where $H^{(i)}$ represents the feature matrix of the molecular graph $G_d = (V_d, E_d)$ for the drug, and $A \in \mathbb{R}^{N \times N}$ denotes the adjacency matrix.
$\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ represents the symmetric normalization of the adjacency matrix, where $\tilde{A} = A + I_N$ introduces self-loops to the nodes by adding the identity matrix $I_N$, ensuring that node features are included during convolution operations. $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$ is the degree matrix used for normalizing $\tilde{A}$ to prevent the occurrence of gradient explosions. $W^{(i)}$ and $W^{(i-1)}$ represent the weight matrices of the current layer and the previous layer, respectively. $\sigma(\cdot)$ is the ReLU activation function. Subsequently, the graph information extracted by the GCN is distilled through multiple residual blocks to obtain the refined feature representation of the drug molecule. After that, to reduce the network complexity and improve the training accuracy, we use a graph pooling layer to scale down the redundant information. Finally, after the two linear layers at the output of the Mol. encoder, we obtain the feature representation of the drug. In these two steps, $V'_d$ represents the node features of the drug graph after the application of GCN, and $W_{i \in \{0,1\}}$ and $b_{i \in \{0,1\}}$ denote the weights and biases of the two linear layers, respectively. The obtained vector $x_d \in \mathbb{R}^{N'}$ is referred to as the drug molecule node, where $N'$ is the dimensionality of $x_d$.
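A single GCN layer with self-loops and symmetric normalization can be written out in a few lines of dependency-free Python. This is a sketch of the standard layer described in the text, with a toy adjacency matrix and made-up weights; a practical implementation would use a tensor library and learn `w` by gradient descent.

```python
import math


def normalized_adjacency(adj):
    """Compute D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]  # degrees including self-loops
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, h, w):
    """One propagation step: ReLU(A_hat @ H @ W)."""
    a_hat = normalized_adjacency(adj)
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]


# Toy three-atom chain with two input features and two output features.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[1.0, -1.0], [0.5, 0.5]]  # illustrative weights, not learned values
a_hat = normalized_adjacency(adj)
out = gcn_layer(adj, h, w)
```

Adding the identity before normalizing keeps each atom's own features in the aggregation, and dividing by the square roots of both endpoint degrees keeps the scale of the features stable across layers.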

DTG distillation
After encoding through the Mol. encoder, we obtain a new drug graph with node feature $x_d$ and edges $E'_d$, and a protein graph $G_p = (V_p, E_p)$. The feature fusion of these graphs forms a heterogeneous graph, resulting in an information-rich pool $G_{DTG} = (V_{DTG}, E_{DTG})$, where $V_{DTG} = \mathrm{concat}(x_d, V_p)$ and $E_{DTG} = \mathrm{concat}(E'_d, E_p)$. The data in this information pool are high-dimensional and redundant. To reduce the data dimensions and expedite model training, a GCN is applied to the DTG in the information pool to capture essential feature information. Then, the DTG is subjected to dimensionality reduction using residual blocks, resulting in the refined drug-protein hetero-network. Finally, we separate the refined bipartite graph into drug and protein graphs using a masking approach. To improve predictive accuracy, we combine the drug features before feature fusion with those obtained after the separation of the bipartite graph. This integration results in a new set of drug features. Subsequently, we employ a fully connected block that concatenates these drug features with the protein features to predict protein-drug affinity values.
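The fuse-then-separate step can be pictured as concatenating the drug node with the protein nodes while remembering which rows belong to which side. The sketch below uses a boolean mask for that bookkeeping; the function names, the mask representation, and the feature values are assumptions made for illustration, and the refinement that would happen between fusion and separation is omitted.

```python
def fuse_nodes(drug_nodes, protein_nodes):
    """Concatenate drug and protein node features and build a drug mask."""
    nodes = list(drug_nodes) + list(protein_nodes)
    is_drug = [True] * len(drug_nodes) + [False] * len(protein_nodes)
    return nodes, is_drug


def separate_nodes(nodes, is_drug):
    """Split the fused node list back into drug and protein parts."""
    drug = [n for n, flag in zip(nodes, is_drug) if flag]
    protein = [n for n, flag in zip(nodes, is_drug) if not flag]
    return drug, protein


# One drug node (the pooled vector x_d) and three residue nodes; toy values.
x_d = [[0.2, 0.7]]
v_p = [[0.1, 0.0], [0.4, 0.3], [0.9, 0.5]]

v_dtg, drug_mask = fuse_nodes(x_d, v_p)
drug_part, protein_part = separate_nodes(v_dtg, drug_mask)
```

Keeping the mask alongside the fused node matrix means the graph layers can treat all nodes uniformly, while the final scoring stage can still route drug and protein rows into their own channels.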

Evaluation indicators
Many metrics exist for assessing model performance and capacity in current DTA/DTI research. However, different research questions and contexts often call for different metrics. Therefore, we use mean squared error (MSE), root mean square error (RMSE), Pearson, Spearman, concordance index (CI) 51 and $r^2$ (coefficient of determination) to assess the performance of our models.
MSE: MSE measures the average squared difference between the model's predicted values and the actual observed values. For a set of $n$ actual observed values (or target values) $y_i$ and their corresponding predicted values (or model outputs) $\hat{y}_i$, MSE is calculated as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
RMSE: The square root of the mean squared difference between the predicted and actual values.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
Pearson: Measures the linear correlation between the predicted value $X$ and the underlying true value $Y$.
$$\mathrm{Pearson} = \frac{\mathrm{cov}(X, Y)}{\sigma(X)\,\sigma(Y)} = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma(X)\,\sigma(Y)}$$
where $\mathrm{cov}(X, Y)$ is the covariance between the predicted value and the ground truth, $\sigma(X)$ is the standard deviation of $X$, and $\sigma(Y)$ is the standard deviation of $Y$. $\mu_X$ and $\mu_Y$ are the mean values of the distributions of $X$ and $Y$, respectively.
Spearman: A statistic obtained by arranging the sample values of two random variables in order of magnitude and using the ranks of the individual sample values instead of the actual data.
$$\mathrm{Spearman} = \frac{\sum_{i}\left(R(\hat{y}_i) - \overline{R(\hat{y})}\right)\left(R(y_i) - \overline{R(y)}\right)}{\sqrt{\sum_{i}\left(R(\hat{y}_i) - \overline{R(\hat{y})}\right)^2 \sum_{i}\left(R(y_i) - \overline{R(y)}\right)^2}}$$
where $R(\hat{y}_i)$ is the rank of the predicted value, $R(y_i)$ is the rank of the true value, $\overline{R(\hat{y})}$ is the average rank of the predicted values, and $\overline{R(y)}$ is the average rank of the true values.
CI: Measures the probability of correctly predicting unequal pairs according to the order.
$$\mathrm{CI} = \frac{1}{Z}\sum_{\delta_i > \delta_j} h(x_i - x_j)$$
where $x_i$ is the predicted value of the larger affinity $\delta_i$, $x_j$ is the predicted value of the smaller affinity $\delta_j$, $Z$ is the number of unequal pairs, used as the normalization constant, and $h(x)$ is the step function 33:
$$h(x) = \begin{cases} 1, & x > 0 \\ 0.5, & x = 0 \\ 0, & x < 0 \end{cases}$$
This metric measures whether the predicted binding affinity values for any drug-target pair are predicted in the same order as their true values. We used paired t-tests to perform statistical significance tests with 95% confidence intervals.
$r^2$: Given the varying scales of different datasets, it is challenging to compare them using metrics like the MSE and RMSE mentioned above. This metric calculates the $R^2$ value with reference to the mean model for comparing the quality of models. The formula for calculating $r^2$ is as follows:
$$r^2 = 1 - \frac{\sum_{i}\left(y_i - \hat{y}_i\right)^2}{\sum_{i}\left(y_i - \bar{y}\right)^2}$$
where $\hat{y}_i$ is the predicted value, $y_i$ the real value, and $\bar{y}$ the mean of the real values.
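The six evaluation metrics can be computed with the standard library alone. The following is a minimal reference sketch under the usual textbook definitions (average ranks for ties in Spearman, a score of 0.5 for tied predictions in CI); the function names and the four toy affinity values are illustrative, and the quadratic-time CI loop is only suitable for small examples.

```python
import math


def mse(y, p):
    return sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)


def rmse(y, p):
    return math.sqrt(mse(y, p))


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(y, p):
    return pearson(ranks(y), ranks(p))


def concordance_index(y, p):
    """Fraction of pairs with different true affinity that are ordered correctly."""
    num, z = 0.0, 0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                z += 1
                diff = p[i] - p[j]
                num += 1.0 if diff > 0 else (0.5 if diff == 0 else 0.0)
    if z == 0:
        raise ValueError("no pairs with different true values")
    return num / z


def r_squared(y, p):
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Toy affinities: true values and slightly noisy predictions.
y_true = [5.0, 6.0, 7.0, 8.0]
y_pred = [5.5, 5.5, 7.0, 9.0]
results = {
    "MSE": mse(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
    "Pearson": pearson(y_true, y_pred),
    "Spearman": spearman(y_true, y_pred),
    "CI": concordance_index(y_true, y_pred),
    "r2": r_squared(y_true, y_pred),
}
```

Lower is better for MSE and RMSE, while higher is better for the four remaining metrics, which is why the two groups are reported and compared separately.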

Experiment setup
We evaluate the performance of our proposed model on benchmark datasets 31,32. We use the same nested cross-validation as the DeepDTA 21 method to determine the best parameters for the validation and test sets. To train a model with enhanced generalization, we randomly partition the dataset into 6 equal parts (4:1:1), designating one part as the independent test set. The remaining parts are used for hyperparameter tuning through 5-fold cross-validation. We conducted special processing for the KIBA dataset. To accelerate model training, we divided the KIBA dataset into four parts and trained on each of the four subsets with identical parameters. For a fair comparison, KronRLS 33, SimBoost 14, and the other baselines use folds with the same settings for the training, validation, and test sets.
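The partitioning scheme can be sketched as follows: shuffle the sample indices, cut them into six equal parts, hold one part out as the independent test set, and rotate the remaining five parts as validation folds. The seed, the function name, and the sample count are arbitrary choices for this illustration and are not taken from the paper.

```python
import random


def split_with_cv(num_samples, num_parts=6, seed=0):
    """Hold out one part for testing; build 5-fold CV splits from the rest."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    parts = [indices[i::num_parts] for i in range(num_parts)]

    test = parts[0]
    folds = parts[1:]
    cv_splits = []
    for k, val in enumerate(folds):
        train = [i for m, fold in enumerate(folds) if m != k for i in fold]
        cv_splits.append((train, val))
    return test, cv_splits


test_idx, cv_splits = split_with_cv(60)
```

Within each of the five rotations, the training, validation, and test portions stand in the 4:1:1 ratio mentioned above, and the test indices never appear in any training or validation fold.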
We set different filter sizes for drug compounds and proteins instead of generic sizes for the experiments because they have different contextual representations.In Table 2, the hyperparameter combination corresponding to the best CI score provided on the validation set is selected as the best hyperparameter combination for modeling the test set.

Comparison experiments
In Tables 3 and 4, KronRLS, SimBoost, DeepDTA, and DeepCDA are mainly based on token-based SMILES representations and token-based FASTA sequence representations, while GraphDTA-GCNNet, GraphDTA-GINNet, GLFA, and GEFA are mainly based on representations of drug graphs or protein graphs.
In Table 3, we report results for Transformer graph early fusion methods on the benchmark datasets Davis and KIBA. Our proposed method achieves the best performance among all listed methods, which is in line with our expectations. To assess the validity and feasibility of the GEFormerDTA method, we evaluated and compared the predictive accuracy of different state-of-the-art binding affinity regression models. The performance of the GEFormerDTA model compared with existing baseline models on the Davis independent test set is shown in Table 3. The proposed method achieved good results in three of the six metrics. The change in the CI metric is less pronounced compared to the best-performing existing methods, showing an improvement of only 0.4 percentage points. The Pearson correlation coefficient and $r^2$ value increased by 3.2 and 2.3 percentage points, respectively. Our ESC drug encoder makes full use of information such as atomic centrality encoding, chemical bond encoding, and spatial information encoding in the drug feature map. MSE, RMSE, and Pearson did not yield satisfactory results, being 1.7, 1.8, and 15 percentage points lower than the optimal performance across all baselines, respectively. The Transformer has global information awareness, which is very beneficial for obtaining complete drug features containing richer information than GCN. This also demonstrates the advantage of applying a Transformer to graph problems.
Table 4 compares the performance of the GEFormerDTA model with the existing baseline models using the KIBA independent test set. We conducted experiments with our model on four subsets of the KIBA dataset. The proposed method showed good performance in the split3 subset, achieving strong results across four metrics (MSE = 0.06, RMSE = 0.244, Spearman = 0.884, $r^2$ = 0.898). The CI metric performed best in the split2 subset with a value of 0.896. Among the baselines, GraphDTA-GINNet achieved the best result in the Pearson metric, with a score of 0.872. Compared to the highest levels of existing methods, the change in the Pearson metric is minimal, with an improvement of only 0.16 percentage points in the split1 subset. The maximum improvement in the $r^2$ metric, when compared to other models, is 4.5 percentage points. In Table 4, GEFormerDTA outperforms baseline models in terms of performance, and the comparison with GEFA in Table 3 highlights the reliability and effectiveness of drug encoding in our method. In recent articles, CI has been used as the primary evaluation metric in models. Although we did not achieve the best performance in some metrics, our model achieved the best CI on two datasets.
To visually represent the predictive performance of our model, Fig. 5a illustrates the fit of the predicted binding affinity values to the true values on the Davis dataset. The scatter plot shows that data points are distributed on both sides of the line $\hat{y} = y$, indicating a reasonable fit. Figure 5b displays the kernel density estimates of the predicted binding affinity values compared to the true values. The dense distribution of curves suggests a high degree of data density. The contour curves generally have an oval shape, and their long axes roughly align with the line $\hat{y} = y$. Figures 6 and 7 show the performance comparison of our method with other methods on the two gold standard datasets. As can be seen from the figures, the CI metric improves on both datasets compared to the baseline models. Among the six evaluation metrics, the proposed method significantly improves Pearson on the four subsets of KIBA. In contrast, on the Davis dataset, the improvement of $r^2$ is more obvious, which shows that our model has stronger generalization ability on the Davis dataset.

Ablation studies
It is well known that the way drug data are encoded is important for the predictive performance of the model in the study of DTA. To verify the importance of each substructure of drug coding in the drug preprocessing stage and its effect on model performance, we performed ablation experiments on each substructure. In Table 5, the GEFormerDTA models lacking one of the encoding substructures (first three rows) all performed worse than the model with all three encoding substructures. The GEFormerDTA model without protein secondary structure and accessible surface area feature encoding (fourth row) performs worse than the model with protein structural features. This shows that the protein structure has a positive effect on the performance of the proposed model. To represent the contribution of centrality encoding, edge encoding, and spatial encoding more intuitively, we present the results from Table 5 in the form of bar charts in Fig. 8.

Conclusion
In this paper, we propose a novel deep learning approach that applies a Transformer to graph-structured data to address the problem of drug affinity prediction, which can accelerate the development of drugs and the repurposing of old drugs. After our analysis of model prediction results, we found that GEFormerDTA is very effective in

Figure 1. Summary of the Davis (left panel) and KIBA (right panel) datasets. (A) Distribution of binding affinity values. (B) Length distribution of SMILES strings. (C) The number of atoms of drug molecules. (D) Length distribution of protein sequences.

Figure 2. Diagrammatic representation of centrality coding, spatial coding and edge coding used for the structure of drug molecules.

Figure 3. Summary of protein features that can be used to study drug target interaction affinity.

Figure 4. Diagram of the proposed model architecture. (a) is the data pre-processing stage of the proposed model. (b) is the encoder of the drug ESC. (c) is the encoder of the drug graph. (d) is our proposed graph feature early fusion process. (e) is the drug-target protein graph refinement process. (f) is our DTA final prediction process.
https://doi.org/10.1038/s41598-024-57879-1 www.nature.com/scientificreports/
$V^{masked}_d$ and $V^{masked}_p$ represent the separated sets of drug nodes and protein nodes, respectively.

DTA score
At the final stage of the model, the separated bipartite graphs flow into their respective data channels, resulting in the drug representation $X^{(final+1)}_d$ and the protein representation $X^{(final+1)}_p$

Figure 5. (a) Linear regression fitted straight lines for true and predicted values on the Davis dataset. (b) Kernel density estimation plots of the true and predicted values on the Davis dataset, where the horizontal coordinates indicate the true binding affinity, and the vertical coordinates indicate the predicted binding affinity. The upper and right bars show the distribution characteristics of the sample size.

Figure 7. Comparison of the levels of our method and other methods on the KIBA dataset under the six evaluation metrics.

Table 3. Predicted binding affinity for the Davis independent test set ("underlined" means suboptimal; "bolded" means optimal). * Reference original data.

Table 5. Ablation experiments based on drug coding modalities in the Davis independent prediction dataset ("underlined" means suboptimal; "bolded" means optimal).

Figure 6. Comparison of the levels of our method and other methods on the Davis dataset under the six evaluation metrics.